MUP - The UIC Standoff Markup Tool
Abstract
Recently developed markup tools for dialogue work are quite sophisticated and require considerable knowledge and overhead, but older tools do not support XML standoff markup, the current annotation style of choice. For the DIAG-NLP project we have created a “lightweight” but modern markup tool that can be configured and used by the working NLP researcher.
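For readers unfamiliar with the term, standoff markup keeps the primary text untouched and stores annotations separately, pointing back into the text by position. The sketch below illustrates that general idea only; the element names (`annotations`, `segment`), the attributes (`start`, `end`, `type`), and the sample sentence are hypothetical and are not taken from MUP or the DIAG-NLP project.

```python
import xml.etree.ElementTree as ET

# Primary data: never modified by the annotation process.
PRIMARY_TEXT = "The valve is stuck. Replace the valve."

# Hypothetical standoff layer (NOT MUP's actual format): each element
# refers to a span of the primary text by character offsets instead of
# wrapping the text inline.
STANDOFF_XML = """
<annotations>
  <segment id="s1" type="diagnosis" start="0" end="19"/>
  <segment id="s2" type="instruction" start="20" end="38"/>
</annotations>
"""


def resolve(text, standoff_xml):
    """Return (id, type, covered_text) for every segment in the layer."""
    root = ET.fromstring(standoff_xml)
    resolved = []
    for seg in root.iter("segment"):
        start = int(seg.get("start"))
        end = int(seg.get("end"))  # end offset is exclusive
        resolved.append((seg.get("id"), seg.get("type"), text[start:end]))
    return resolved


if __name__ == "__main__":
    for seg_id, seg_type, covered in resolve(PRIMARY_TEXT, STANDOFF_XML):
        print(seg_id, seg_type, repr(covered))
```

Because the annotations live outside the text, several independent layers (possibly with overlapping spans) can refer to the same primary data without conflicting with one another.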
Similar resources
Design of a Standoff Object-Oriented Markup Language (sooml) for Annotating Biomedical Literature
With the rapid growth of electronically available scientific literature, text mining is attracting increasing attention. While numerous algorithms, tools, and systems have been developed for extracting information from text, little effort has been focused on how to mark up the information. We present the design of a standoff, object-oriented markup language (called SOOML), which is simple, expr...
Less Destructive Cleaning of Web Documents by Using Standoff Annotation
Standoff annotation, that is, the separation of primary data and markup, can be an interesting option for annotating web pages, since it does not demand the removal of annotations already present in them. We will present a standoff serialization that allows for annotating well-formed web pages with multiple annotation layers in a single instance, easing processing and analysis of the data.
Representing and Querying Standoff XML
The paper discusses the representation and exploitation of multi-level annotated linguistic data. We first present a standoff XML representation, which distributes information over separate, standoff layers and allows us to represent annotations of various kinds in a uniform, generic way. This format serves as our interchange format. We further introduce an XML-inline representation that is des...
Multidimensional markup and heterogeneous linguistic resources
The paper discusses two topics: firstly an approach of using multiple layers of annotation is sketched out. Regarding the XML representation this approach is similar to standoff annotation. A second topic is the use of heterogeneous linguistic resources (e.g., XML annotated documents, taggers, lexical nets) as a source for semiautomatic multi-dimensional markup to resolve typical linguistic iss...
Extending standoff annotation
Information encoding is often complex. Textual information is sometimes accompanied by additional encodings (such as visuals). These multimodal documents may be interesting objects of investigation for linguistics. Another class of complex documents are pre-annotated documents. Classic XML inline annotation often fails for both document classes because of overlapping markup. However, standoff a...